DETAILED ACTION

Response to Arguments 

1. Applicant's arguments filed 7/16/2008 have been fully considered but they are
not persuasive. Komori et al. disclose a method of and system for adapting speech
models using noise extracted during silence periods in the speech signal (the operation
of figure 2). The only feature that Komori et al. lack is using "speech echo" instead of
noise to adapt speech models. However, Takiguchi was relied upon for the teaching of
using "reverberant speech" to adapt speech models (page 128, left column). And since
"reverberant speech" is considered the same as "echo speech", it would have been
obvious to one of ordinary skill in the art at the time of invention to modify Komori et al.
by incorporating the teaching of Takiguchi. In fact, the system of Komori et al. would
inherently be able to adapt speech models with any available type of adaptive signal,
whether it is noise or echo speech, as long as an adaptive signal is available for
adaptation.

2. Instead of using noise to generate a noise model for adapting speech models,
Takiguchi et al. teach using "reverberant speech" to adapt speech models. Therefore,
the combination of Komori et al. and Takiguchi would teach all the claimed limitations,
including generating an "echo speech model" and "adding the echo speech model" to adapt
speech models.

3. In response to applicant's argument that Komori et al. fail to
disclose/suggest "a storage area for storing a feature quantity acquired from a speech
signal for each frame storing portions for storing acoustic model data and language
model data, respectively", as explained in the previous Office action, any computing
system inherently includes buffer memory and storage memory for handling the input
speech signal for processing by the system. A sound card inherently includes memory
and/or buffer memory (sound analysis section 102 in figure 2 inherently includes a
buffer memory for temporarily storing the received speech signal for processing). In
fact, before the speech recognition operation, speech features must first be extracted
from the received speech and preserved or stored for further processing in matching or
comparing with speech models to determine a best match (referring to elements 203
and 105 in figure 1, speech HMM 4; language model or grammar or dictionary).



Claim Objections 

4. The newly added limitation in claim 7 needs clarification. The clean speech models
already exist in the speech recognition system. It is not clear why the speech
models are generated again or from what they are generated. In this Office action, the
examiner has treated the step of "generating a speech model" as accessing speech models
in the speech recognition system for adaptation.



Claim Rejections - 35 USC §112 

5. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

6. Claim 7 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite
for failing to particularly point out and distinctly claim the subject matter which applicant
regards as the invention. The newly added limitation regarding "generating a speech
model" is not clear. The clean speech models already exist in the speech
recognition system. It is not clear why the speech model is generated again or from
what it is generated. Claim 7 needs clarification.

7. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

8. Claim 7 recites the limitation "said storing portion" in line 7. There is insufficient 
antecedent basis for this limitation in the claim. 

9. Claim 8 recites the limitation "said sum" in lines 6-7. There is insufficient
antecedent basis for this limitation in the claim. 

10. Claim 12 recites the limitation "said storing portion" in line 7. There is insufficient 
antecedent basis for this limitation in the claim. 

Claim Rejections - 35 USC § 103

11. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

12. Claims 7-10, 12, and 15-16 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Komori et al. (US 5956679) in view of Takiguchi et al. (IEEE 
Publication). 

13. Regarding claim 7, Komori et al. disclose a speech recognition method for 
causing a speech recognition device configured to include a computer to perform 
speech recognition, the method causing the speech recognition device to execute the
steps of: 

storing in a storage area a feature quantity acquired from a current speech signal 
for each frame (sound analysis section 102 in figure 1 inherently includes a buffer 
memory for temporarily storing the received speech signal for further processing; also 
referring to col. 5, lines 22-32); 

reading from a storage portion a noise signal acquired immediately prior to the 
current speech signal to be processed at the current time point to generate noise model 
data (steps 401-402 in figure 2; noise intervals are extracted from the input speech
signal and are processed in step 401 in figure 2; the sound analysis section 102 in figure
1 inherently includes a buffer memory for temporarily storing the noise for further
processing by steps 401-402 in figure 2; also referring to col. 5, lines 49-57); 

processing a speech model stored in a storing portion using a noise adaptation 
model generation portion for generating noise model data from a noise signal acquired 
immediately prior to the current speech signal to be processed at the current time point 
(step 402 in figure 2 generating a noise HMM from the noise intervals extracted from the
input speech signal); 

generating a speech model affected by intra-frame echo influence using acoustic 
model data and an intra-frame characteristic (treated as accessing clean speech 
models; step 203 in figure 2); 

adding the noise model data to the speech model affected by intra-frame echo 
influence to generate an adapted acoustic speech model data and store it in a storage 
area (step 403 in figure 2; adapting speech models using noise model; or referring to 
the operation of figure 7; adding noise model to the clean speech model to generate 
adaptive speech models); and 

processing said feature quantity, said adapted acoustic model data, and 
language model data stored in a storing portion to generate a speech recognition result 
of the current speech signal (recognition process in steps 303-305 and 104-106 in figure 2).

Komori et al. fail to specifically disclose an "echo speech" in place of noise. 
However, Takiguchi et al. teach "echo speech" (page 128, left column, "reverberant 
speech" is considered the same as an "echo speech"). 

Because Komori et al. and Takiguchi et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Komori et al. by incorporating the teaching of
Takiguchi et al. in order to improve speech recognition accuracy.

14. Regarding claim 12, Komori et al. disclose a computer-readable program 
embodied in a computer readable storage medium for causing a computer to execute 
the speech recognition method comprising the steps of: 

storing in a storage area a feature quantity acquired from a current speech signal 
for each frame (sound analysis section 102 in figure 1 inherently includes a buffer 
memory for temporarily storing the received speech signal for further processing; also 
referring to col. 5, lines 22-32); 

reading from a storage portion a noise signal acquired immediately prior to the 
current speech signal to be processed at the current time point to generate noise model 
data (steps 401-402 in figure 2; noise intervals are extracted from the input speech
signal and are processed in step 401 in figure 2; the sound analysis section 102 in figure
1 inherently includes a buffer memory for temporarily storing the noise for further 
processing by steps 401-402 in figure 2; also referring to col. 5, lines 49-57); 

processing a speech model stored in a storing portion using a noise adaptation 
model generation portion for generating noise model data from a noise signal acquired 
immediately prior to the current speech signal to be processed at the current time point 
(step 402 in figure 2 generating a noise HMM from the noise intervals extracted from the
input speech signal); 

processing a speech model stored in a storing portion using an echo adaptation 
model generation portion for generating echo speech model data from a speech signal 
acquired immediately prior to the current speech signal to be processed at the current 
time point (step 402 in figure 2 generating noise HMM from the noise intervals extracted 
from the input speech signal) and using a noise model data to generate adapted 
acoustic speech model data and store it in a storage area (step 403 in figure 2; adapting 
speech models using noise model; or referring to the operation of figure 7; adding noise 
model to the clean speech model to generate adaptive speech models; the adaptive 
models are inherently preserved or stored for the recognition step); 

processing said feature quantity, said adapted acoustic model data, and 
language model data stored in a storing portion to generate a speech recognition result 
of the current speech signal (recognition process in steps 303-305 and 104-106 in figure 2).

Komori et al. fail to specifically disclose an "echo speech" in place of noise. 
However, Takiguchi et al. teach "echo speech" (page 128, left column, "reverberant 
speech" is considered the same as an "echo speech"). 

Because Komori et al. and Takiguchi et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Komori et al. by incorporating the teaching of
Takiguchi et al. in order to improve speech recognition accuracy.



Application/Control Number: 10/849,724 Page 9 

Art Unit: 2626 

15. Regarding claim 8, Komori et al. further disclose the speech recognition method 
according to claim 7, wherein the step of generating said adapted acoustic model data 
further comprises the step of: a model data area transforming portion reading sum 
calculated by said adding portion (figure 7, transformation from HMM to linear); and
transforming cepstrum acoustic model data into linear spectrum acoustic model data 
(referring to figure 7). 

16. Regarding claim 9, Komori et al. further disclose the speech recognition method 
according to claim 8, further comprising a step of: causing an adding portion to read and 
add said linear spectrum acoustic model data and said echo speech model data to 
generate a maximum likelihood echo prediction coefficient (referring to figure 7; adding 
noise model to clean speech model). 
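
For illustration only, the two operations mapped above (transforming cepstrum model data into the linear spectrum domain, then adding the interfering model to the speech model) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Komori et al. or of the claims: it combines mean vectors only, it assumes the cepstrum is an orthonormal DCT of the log spectrum, and the function names and the gain parameter are hypothetical.

```python
import math


def idct(cepstrum):
    """Inverse orthonormal DCT-II: cepstrum -> log spectrum (assumed convention)."""
    n = len(cepstrum)
    out = []
    for k in range(n):
        total = cepstrum[0] * math.sqrt(1.0 / n)
        for j in range(1, n):
            angle = math.pi * j * (2 * k + 1) / (2 * n)
            total += math.sqrt(2.0 / n) * cepstrum[j] * math.cos(angle)
        out.append(total)
    return out


def dct(log_spectrum):
    """Orthonormal DCT-II: log spectrum -> cepstrum (assumed convention)."""
    n = len(log_spectrum)
    out = []
    for j in range(n):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        acc = 0.0
        for k in range(n):
            acc += log_spectrum[k] * math.cos(math.pi * j * (2 * k + 1) / (2 * n))
        out.append(scale * acc)
    return out


def cepstrum_to_linear(cepstrum):
    """Cepstrum -> linear spectrum: inverse DCT, then exponentiate each bin."""
    return [math.exp(v) for v in idct(cepstrum)]


def linear_to_cepstrum(spectrum):
    """Linear spectrum -> cepstrum: log of each bin, then DCT."""
    return dct([math.log(v) for v in spectrum])


def combine_means(speech_cepstrum, interference_cepstrum, gain=1.0):
    """Add an interfering (noise or echo) mean to a speech mean in the linear domain.

    `gain` is a hypothetical weighting of the interfering model.
    """
    speech = cepstrum_to_linear(speech_cepstrum)
    interference = cepstrum_to_linear(interference_cepstrum)
    combined = [s + gain * i for s, i in zip(speech, interference)]
    return linear_to_cepstrum(combined)
```

In this sketch the addition is done in the linear spectrum domain because additive interference sums there, whereas it does not sum in the cepstral domain; the result is transformed back so that it can be compared with cepstral feature quantities.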

17. Regarding claim 10, Komori et al. fail to specifically disclose the speech
recognition method according to claim 9 wherein the step of transformation into said 
linear spectrum acoustic model data comprises a step of causing said adding portion to 
add the cepstrum acoustic model data of said acoustic model and cepstrum acoustic 
model data of an intra-frame transfer characteristic to generate the speech model 
affected by intra-frame echo influence. However, Takiguchi et al. teach the step of 
causing said adding portion to add the cepstrum acoustic model data of said acoustic 
model and cepstrum acoustic model data of an intra-frame transfer characteristic to 
generate the speech model affected by intra-frame echo influence (referring to figure 3
or equation 7 on page 129). 

Because Komori et al. and Takiguchi et al. are analogous art from the same field of
endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Komori et al. by incorporating the teaching of
Takiguchi et al. in order to improve speech recognition accuracy.

18. Regarding claim 15, Komori et al. further disclose the speech recognition method
according to claim 7 wherein said storing comprises steps of: transforming a received
current speech signal into a digital signal (A/D converter 101b in figure 1); and storing
the transformed signal with amplitude associated with a time frame (inherently included
in the sound analysis section 102 in figure 2 since features extracted from the input
speech include frequency domain features).

19. Regarding claim 16, Komori et al. further disclose the speech recognition method 
according to claim 9 wherein said echo prediction coefficient is calculated for at least 
one of a particular signal receiving device, a level of recognition efficiency, a level of 
recognition speed, and each state of a Hidden Markov Model (noise HMM models in 
figure 2 derived from noise features including spectral features). 



Allowable Subject Matter 

20. Claims 11 and 14 are objected to as being dependent upon a rejected base claim,
but would be allowable if rewritten in independent form including all of the limitations of 
the base claim and any intervening claims. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to HUYEN X. VO whose telephone number is (571) 272-7631.
The examiner can normally be reached on M-F, 9-5:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Patrick Edouard, can be reached on 571-272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/HuyenXVo/ 10/20/2008 
Primary Examiner, Art Unit 2626 